Lossy statistical data compression

ABSTRACT

A method performed in real-time includes receiving and storing time-based data over a specific time period and dividing the specific time period into a plurality of time windows. The method further includes determining that data associated with two or more proximate time windows are within a predetermined variance of one another and responsive to the determination: generating a mathematical function representative of the data associated with the two or more proximate time windows, deleting the data associated with the two or more proximate time windows, and generating a representation of the deleted data from the mathematical function. In certain embodiments, the data comprises empirical network telemetry data.

TECHNICAL FIELD

The present disclosure is directed to systems and methods of data compression and, more particularly, to systems and methods of lossy statistical data compression.

BACKGROUND

Data is a valuable commodity in any number of fields. Data analysis performed on stored data can provide valuable insight that can be used to make informed decisions, create machine learning models, or train artificial intelligence systems. However, the amount of data available is growing every day as more and more devices are connected to the Internet to become “smart.” Correspondingly, the costs associated with the storage of that data have grown. Today, mid-size organizations are working with Petabytes of stored data while larger organizations are dealing with Exabytes of stored data, and it is estimated that by the year 2025 there will be 175 Zettabytes of stored data around the world. As the amount of data generated and stored continues to grow exponentially, there is a need to ensure data is processed and stored in an efficient and cost-effective manner. It is with respect to these and other general considerations that the present disclosure is made.

SUMMARY

Lossy statistical data compression of the present disclosure provides an efficient and cost-effective manner of processing and storing growing mountains of data, e.g., “Big Data,” by encapsulating time-bounded data into mathematically modelled empirical functions. Stored data that can be represented by these one or more functions can be eliminated in favor of the functions, thereby reducing the amount of storage needed and its associated cost.

In certain aspects, the present disclosure is directed to a method of lossy statistical data compression. The method includes receiving time-based empirical data for a specific period of time and storing the time-based empirical data in a memory storage device. The method further includes dividing the specific period of time into a plurality of time windows and performing a statistical empirical distribution analysis on the empirical data associated with a first one of the plurality of time windows to obtain a first analysis result, along with performing a statistical empirical distribution analysis on the empirical data associated with a next one of the plurality of time windows to obtain a second analysis result. The method further includes determining that empirical data associated with the first one of the time windows is within a threshold variance of the empirical data associated with the next one, e.g., the second, of the time windows, and, responsive to that determination, associating the first analysis result with the second time window and deleting from the memory device the empirical data associated with the second time window. The method further includes generating a representation of the empirical data associated with the first and second time windows from the first analysis result.

In certain aspects, the method is performed in real-time, near real-time, or offline (on-demand). In certain aspects, the empirical data comprises empirical network telemetry data. In certain aspects, the method additionally includes deleting the empirical data associated with the first time window. In certain aspects, the data being stored and deleted comprises “Big Data.” In certain aspects, generating the representation of the empirical data includes generating a graphical representation of the empirical data. In certain aspects, the method further includes analyzing the representation of the data to obtain data relevant to a predetermined desired insight. In certain embodiments, the predetermined desired insight includes one or both of network capacity planning and the occurrence of peak and non-peak network traffic time periods. In certain aspects, the method further includes determining that the empirical data associated with the first time window is not within the threshold variance of the empirical data associated with the second of the time windows and, responsive to this determination, defining the second time window as a new first time window and defining the second analysis result as a new first analysis result, then performing a statistical empirical distribution analysis on the empirical data associated with the new first time window to obtain a new first analysis result and performing a statistical empirical distribution analysis on the empirical data associated with a next one of the plurality of time windows to obtain a second analysis result.

In certain aspects, the present disclosure is directed to a method performed in real-time or on demand that includes receiving and storing time-based data over a specific time period and dividing the specific time period into a plurality of time windows. The method further includes determining that data associated with two or more proximate time windows are within a predetermined variance of one another and, responsive to the determination: generating a mathematical function (e.g., by performing a statistical empirical distribution analysis) representative of the data associated with the two or more proximate time windows, deleting the data associated with the two or more proximate time windows, and generating a representation of the deleted data from the mathematical function.

This Summary is provided to introduce a selection of concepts in simplified form that are further described in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features and/or advantages of the concepts can be appreciated from the Detailed Description provided below.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive examples are described with reference to the following figures.

FIG. 1 is a schematic of an example architecture for implementing lossy statistical data compression according to the present disclosure.

FIG. 2 is a flowchart illustrating a method for lossy statistical data compression.

FIG. 3 is a simplified example of a graphical representation generated from the results of statistical empirical data analysis under lossy statistical data compression.

FIG. 4 is a simplified example of a graphical representation generated from the results of statistical empirical data analysis under lossy statistical data compression.

FIG. 5 is a simplified example of a graphical representation generated from the results of statistical empirical data analysis under lossy statistical data compression.

FIG. 6 is a block diagram illustrating example physical components of a computing device with which aspects of the disclosure may be practiced.

DETAILED DESCRIPTION

Various aspects of lossy statistical data compression are described more fully below with reference to the accompanying drawings. It should be appreciated, however, that numerous variations on the disclosed aspects are possible as can be appreciated by those skilled in the art. The various aspects of lossy statistical data compression may be practiced in different forms including methods, systems, or devices. Further, the various aspects of lossy statistical data compression may take the form of a hardware implementation, a software implementation or an implementation combining both hardware and software.

An example of the application of lossy statistical data compression is provided herein in the context of network telemetry. However, it is understood that the described lossy statistical data compression can be applied to any time-based data. Specific examples of additional applications where lossy statistical data compression may be applied include, but are not limited to, network traffic pattern identification and predictions that can be used to detect and prevent outages, long-term data storage, and large data migrations.

Lossy statistical data compression of the present disclosure provides an efficient and cost-effective manner of processing and storing growing mountains of data, e.g., “Big Data,” by encapsulating time-bounded data into mathematically modelled empirical functions. Stored data that can be represented by these one or more functions can be eliminated in favor of the functions, thereby reducing the amount of storage needed and its associated cost.

Accordingly, the lossy statistical data compression of the present disclosure provides a plurality of technical benefits including but not limited to: a reduction in the amount of data that needs to be persistently stored, a reduction in the amount of data storage space used, a reduction in cost associated with the storage space (e.g., less space used = less cost), and an ability to reliably produce a representation of data without having to use the actual data.

FIG. 1 illustrates a simplified overview of an example architecture 100 for implementing lossy statistical data compression according to the present disclosure. As shown, the example architecture 100 is presented in the context of real-time or near real-time network telemetry where routers, switches and firewalls continuously push data related to the network's health to a centralized location for storage and analysis. In the illustrated example, data acquisition occurs via a router 110 that handles two-way data traffic from a plurality of destinations including the Internet 112, a first remote site 114, a second remote site 116 and a local area network (LAN) 118. The router 110 is typically a backbone router or a router using Netflow™, which executes network flow monitoring technology to acquire network telemetry data which is pushed to one or more computing devices 120 (e.g., a server computing device) for storage in one or more memory storage devices 122.

The network telemetry data, which is acquired and stored relative to time, can include, for example, attributes of Internet Protocol (IP) packets such as IP source address, IP destination address, source port, destination port, layer 3 protocol type, class of service, router or switch interface, and/or type of service byte (ToS Byte); the acquisition and storage of other network telemetry data is also possible. Any suitable manner of storing the acquired network telemetry data can be used including, for example, flat file storage (e.g., a plain text database), real-time distributed event storage and messaging systems such as Kafka™ or RabbitMQ™, and distributed data storage such as provided by Apache Hadoop™, which can be queried by Apache Hive™; other manners of data storage are also possible.

Once stored, an application for lossy statistical data compression 123, described in further detail with reference to FIG. 2, is executed by one or more computing devices 124 (which can be the same or different from the one or more computing devices 120) and applied to the stored network telemetry data such that at least a portion of the data is represented by one or more mathematically modelled empirical functions, prompting elimination (e.g., deletion from memory as represented by trash bin 126) of the represented data and resulting in a reduced amount of storage required, as represented by the reduced number of memory storage devices 128 (which can be the same or different from the memory storage devices 122). In certain embodiments, the lossy statistical data compression is performed in real-time or near real-time, but it is also possible to perform the lossy statistical data compression on demand. In certain embodiments, the lossy statistical data compression is executed through use of instructions coded in an analytics engine such as Apache Spark™, Spark Streaming™ and/or Apache Druid™; other analytics engines may also be used.

The compressed data is then decompressed, for example, by a decompression application 130 on a client computing device 132, for user analysis, wherein a representation of the original, and now eliminated, data is reproduced from the mathematically modelled empirical functions along with any data that was not suited to mathematical modeling. In the context of network telemetry, analysis of the decompressed data is used, for example, to obtain insight into traffic and bandwidth usage for informed network capacity planning and/or to obtain insight into when peak and non-peak traffic time periods occur; data analysis for other uses may also be performed.

Referring now to the flowchart of FIG. 2, a method 200 for lossy statistical data compression can be appreciated. As shown, the method 200 includes receiving time-based data, S210, with each point in time associated with a single data point; the data received is considered “Big Data” in that the amount and velocity of data is, for example, thousands of data points per hour. It should be noted that time can be measured in any suitable increment to provide a desired insight to the data including, for example, microseconds, seconds, minutes, hours, days, months, and years or any division thereof.

The method 200 further includes setting or determining a length of time to be associated with the time window and/or setting or determining a number of time windows into which the time period is to be divided, S212. For example, a time window (ti) of 10 seconds can be selected for an overall time period of five minutes, or a 12-hour time window (ti) can be selected for an overall time period of 30 days. In another example, an overall time period is divided into 15 time windows. In certain embodiments, the number of time windows into which the overall time period is to be divided is a predetermined number, while in other embodiments a user entry (e.g., manual entry of a number or selection from a menu) decides the number of time windows into which the overall time period is to be divided. In certain embodiments, rather than a number of windows, or in addition thereto, a length of time of the time window is predetermined or user-entered. In certain embodiments, each of the time windows is of a consistent length, while in other embodiments at least two of the time windows are of different lengths of time.
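The division of step S212 can be sketched in a few lines of Python. This is a minimal illustration only, assuming samples arrive as (timestamp, value) pairs with timestamps in seconds and that all windows share one length; the function name split_into_windows and its parameters are hypothetical and do not appear in the disclosure.

```python
import math


def split_into_windows(samples, start, period, window_length):
    """Group (timestamp, value) samples into consecutive fixed-length windows.

    Hypothetical helper: `start`, `period`, and `window_length` share one
    time unit (seconds here). Samples outside [start, start + period) are
    ignored.
    """
    window_count = math.ceil(period / window_length)
    windows = [[] for _ in range(window_count)]
    for timestamp, value in samples:
        offset = timestamp - start
        if 0 <= offset < period:
            windows[int(offset // window_length)].append(value)
    return windows


# A five-minute period divided into 10-second windows, as in the example above.
samples = [(0, 1.0), (9.9, 2.0), (10, 3.0), (299, 4.0), (300, 5.0)]
windows = split_into_windows(samples, start=0, period=300, window_length=10)
```

Windows of differing lengths, which the disclosure also contemplates, would need a list of window boundaries in place of the single window_length used here.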

Continuing with method 200, a statistical empirical distribution analysis is executed on the data associated with the first time window (ti) and a result of the analysis (Yi) is generated, S214. In statistics, an empirical distribution function is the distribution function associated with an empirical measure of observed data. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of the observations of the measured variable that are less than or equal to the specified value. The empirical distribution function is an estimate of the cumulative distribution function that generated the data points in the sample. It converges with probability 1 to that underlying distribution.
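The empirical distribution function described above can be illustrated with a short Python sketch. It implements only the textbook definition (the fraction of observations less than or equal to a given value); the name make_ecdf is hypothetical, and the disclosure does not state that the analysis result (Yi) takes this exact form.

```python
from bisect import bisect_right


def make_ecdf(values):
    """Return the empirical distribution function of `values`.

    The returned function maps x to the fraction of observations <= x,
    so it steps up by 1/n at each of the n data points.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


ecdf = make_ecdf([3, 1, 2, 2])
```

With the four observations above, the function rises by 1/4 at 1, by 2/4 at the repeated value 2, and by 1/4 at 3.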

Further, the method includes incrementing the time window, e.g., (ti+1), to obtain the next time window and the data associated therewith, S216. The statistical empirical distribution analysis is executed on the data associated with the next time window (ti+1) to generate results of (Yi+1), S218. The data associated with time window (ti) is then compared to the data associated with time window (ti+1) to determine if the data associated with time window (ti) is within a predetermined variance threshold of the data associated with (ti+1), for example, the data associated with (ti) is within ±1%, ±5% or ±10% of (ti+1), S220; other variance thresholds are also possible. The data associated with time window (ti) being within the predetermined variance threshold of the data associated with time window (ti+1) indicates the data sets are suited for a best fit analysis.
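The disclosure does not specify which quantity is compared in step S220. The following Python sketch assumes, purely for illustration, that two windows are compared by the relative difference of their means; the function name within_threshold and the choice of the mean as the compared statistic are assumptions, and another statistic (for example, a distance between empirical distribution functions) could be substituted.

```python
from statistics import fmean


def within_threshold(window_a, window_b, threshold=0.05):
    """Assumed S220 test: is mean(window_a) within +/- threshold of mean(window_b)?

    `threshold` is a fraction, so 0.05 corresponds to the +/-5% example.
    Both windows must be non-empty.
    """
    mean_a = fmean(window_a)
    mean_b = fmean(window_b)
    if mean_b == 0:
        # A relative difference is undefined here; require an exact match.
        return mean_a == 0
    return abs(mean_a - mean_b) <= threshold * abs(mean_b)


close = within_threshold([100, 102], [101, 103], threshold=0.05)
far = within_threshold([100], [120], threshold=0.10)
```

In this example the first pair of windows differs by about 1% and passes the ±5% test, while the second pair differs by about 17% and fails the ±10% test.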

If the data associated with time window (ti) is not within a predetermined variance threshold of the data associated with time window (ti+1), S220:NO, all data associated with the time window (ti) and result (Yi) as well as all data associated with the time window (ti+1) and result (Yi+1) is maintained and the result (Yi+1) is designated as the new result (Yi), S222.

If the data associated with time window (ti) is within a predetermined variance threshold of the data associated with time window (ti+1), S220:YES, then the result (Yi) is saved to memory along with its associated data while the data associated with the result (Yi+1) is eliminated (e.g., erased or deleted from memory) as it can be represented with the result (Yi). Further, the time window (ti+1) is designated as the new (ti), S224.

The method 200 continues from either S222 or S224, where it is determined if there is additional data within the data sample that has not been subjected to the empirical data analysis, S226. If there is additional data to be analyzed, S226:YES, control returns to S216 to increment to the next time window (ti+1) and its associated data for execution of the statistical empirical distribution analysis, S218.

If there is no further data to subject to the statistical empirical data analysis, S226:NO, any results (Yi) not yet stored to memory are stored and any remaining data that is represented by one or more results (Yi) of the statistical empirical data analysis is deleted from memory, freeing additional memory space, S228.
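The loop of steps S214 through S228 can be sketched as follows. This is one possible reading, not the disclosed implementation, and it rests on several labeled assumptions: the analysis result (Yi) is stood in for by a simple count-and-mean summary, the S220 test is a relative difference of window means, each new window is compared against the window that opened the current run, and every window is non-empty. The names summarize, similar, and compress are hypothetical.

```python
from statistics import fmean


def summarize(values):
    """Hypothetical stand-in for the analysis result (Yi) of one window."""
    return {"count": len(values), "mean": fmean(values)}


def similar(reference, candidate, threshold):
    """Assumed S220 test: relative difference of the two window means."""
    ref_mean, cand_mean = fmean(reference), fmean(candidate)
    if cand_mean == 0:
        return ref_mean == 0
    return abs(ref_mean - cand_mean) <= threshold * abs(cand_mean)


def compress(windows, threshold=0.05):
    results = []   # stored analysis results, one per run of similar windows
    mapping = []   # mapping[k] is the index in `results` representing window k
    raw = {}       # raw data retained for the window that opens each run
    reference = None
    for k, values in enumerate(windows):
        if reference is not None and similar(reference, values, threshold):
            # S220:YES - reuse the current result; this window's data is dropped.
            mapping.append(len(results) - 1)
        else:
            # First window or S220:NO - start a new run with its own result.
            results.append(summarize(values))
            mapping.append(len(results) - 1)
            raw[k] = list(values)
            reference = values
    # S228 - drop raw data for runs whose result represents two or more windows.
    for k in list(raw):
        if mapping.count(mapping[k]) > 1:
            del raw[k]
    return results, mapping, raw


windows = [[100, 102], [101, 103], [99, 101], [200, 210], [50, 52]]
results, mapping, raw = compress(windows, threshold=0.05)
```

In this example the first three windows share one result and none of their raw data is retained, while the last two windows each keep their own raw data because no neighboring window fell within the threshold.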

Continuing with the method 200, the one or more results (Yi) of the statistical empirical data analysis for the various time windows, along with any data not represented by a statistical empirical data analysis result (Yi), are used to generate a representation of the original data, S230. Optionally, the generated representation of the data can be used for analysis and/or for generating a graphical presentation of the representation of the original data, S232. For example, based on a predetermined desired insight, the representation of the original data is analyzed to obtain data relevant to the predetermined desired insight.
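Step S230 can be illustrated with a minimal Python sketch. It assumes a hypothetical storage layout that is not specified in the disclosure: a list of results, each a count-and-mean summary; a mapping from each window index to the result that represents it; and a dictionary of raw data for windows that were not compressed. Windows without raw data are regenerated as repeated copies of the stored mean, which is a deliberately crude form of lossy reconstruction.

```python
def reconstruct(results, mapping, raw):
    """Rebuild a per-window representation of the original data.

    results: list of {"count": int, "mean": float} summaries (assumed form)
    mapping: mapping[k] is the index in `results` representing window k
    raw:     {window index: retained raw values} for uncompressed windows
    """
    series = []
    for k, result_index in enumerate(mapping):
        if k in raw:
            # Data not represented by a shared result is used as stored.
            series.append(list(raw[k]))
        else:
            # Lossy step: regenerate the window from its analysis result.
            result = results[result_index]
            series.append([result["mean"]] * result["count"])
    return series


results = [{"count": 2, "mean": 101.0}, {"count": 2, "mean": 205.0}]
mapping = [0, 0, 1]
raw = {2: [200, 210]}
series = reconstruct(results, mapping, raw)
```

A richer analysis result, such as a fitted function or a full empirical distribution, would allow a closer representation than the constant values generated here.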

Accordingly, at least a portion of the originally obtained data has been compressed into one or more statistical empirical data analysis results (Yi), in the form of one or more mathematical functions, and memory storage previously occupied by the original data has been freed responsive to the execution of the method 200 for lossy statistical data compression.

FIG. 3 illustrates a simplified example of a graphical presentation generated from the results of the statistical empirical data analysis. In this instance, observed thresholds for each minute of a 1000-minute time interval are represented by a single statistical empirical data analysis result (Yi), e.g., a line, in the form of y=0.1*n*x²+3.106, where n is the number of observations.

FIG. 4 is an example of a graphical representation generated from the results of the statistical empirical data analysis under lossy statistical data compression. In this instance the original data, indicated with a solid line, represents a quantity of network traffic over time measured in hours. A first statistical empirical data analysis result f(x) obtained under lossy statistical data compression is used to generate data representative of a first portion of the original data. Similarly, second, third and fourth empirical data analysis results, g(x), h(x), and i(x), respectively, obtained under lossy statistical data compression are used to generate data representative of second, third and fourth portions of the original data. Further, the original data that supported the generation of f(x), g(x), h(x) and i(x) has been deleted from memory.

FIG. 5 is another example of a graphical representation generated from the results of the statistical empirical data analysis under lossy statistical data compression. Once again, the original data, indicated with a solid line, represents a quantity of network traffic over time measured in hours. A first statistical empirical data analysis result f(x) obtained under lossy statistical data compression is used to generate data representative of a first portion of the original data. Similarly, second, third and fourth empirical data analysis results, g(x), h(x), and i(x), respectively, obtained under lossy statistical data compression are used to generate data representative of second, third and fourth portions of the original data. Further, the original data that supported the generation of f(x), g(x), h(x) and i(x) has been deleted from memory.

FIG. 6 is a block diagram illustrating physical components (e.g., hardware) of a computing device 600 with which aspects of the disclosure may be practiced. The computing device components described below may be suitable for the computing devices described above, including the computing devices 120, 124 and 132. In a basic configuration, the computing device 600 may include at least one processing unit 602 and a system memory 604. Depending on the configuration and type of computing device, the system memory 604 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories.

The system memory 604 may include an operating system 605 and one or more program modules 606 suitable for running software applications, such as one or more components supported by the systems described herein. As examples, system memory 604 may include a lossy statistical data compression application 624 and a decompression application 626. The operating system 605, for example, may be suitable for controlling the operation of the computing device 600.

Furthermore, embodiments of the disclosure may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in FIG. 6 by those components within a dashed line 608. The computing device 600 may have additional features or functionality. For example, the computing device 600 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 6 by a removable storage device 609 and a non-removable storage device 610.

As stated above, a number of program modules and data files may be stored in the system memory 604. While executing on the processing unit 602, the program modules 606 may perform processes including, but not limited to, the aspects described herein. Other program modules that may be used in accordance with aspects of the present disclosure may include network flow monitoring applications, distributed data storage applications, and analytics applications.

Furthermore, embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the disclosure may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in FIG. 6 may be integrated onto a single integrated circuit. Such an SOC device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality, all of which are integrated (or “burned”) onto the chip substrate as a single integrated circuit. When operating via an SOC, the functionality described herein with respect to the capability of client to switch protocols may be operated via application-specific logic integrated with other components of the computing device 600 on the single integrated circuit (chip). Embodiments of the disclosure may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the disclosure may be practiced within a general-purpose computer or in any other circuits or systems.

The computing device 600 may also have one or more input device(s) 612 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. Output device(s) 614 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 600 may include one or more communication connections 616 allowing communications with other computing devices 650. Examples of suitable communication connections 616 include, but are not limited to, radio frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.

The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 604, the removable storage device 609, and the non-removable storage device 610 are all computer storage media examples (e.g., memory storage). Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 600. Any such computer storage media may be part of the computing device 600. Computer storage media does not include a carrier wave or other propagated or modulated data signal.

Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.

Aspects of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to aspects of the disclosure. The functions/acts noted in the blocks may occur out of the order as shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.

What is claimed:
 1. A method of lossy statistical data compression, comprising: receiving time-based empirical data for a specific period of time; storing in a memory device the time-based empirical data; dividing the specific period of time into a plurality of time windows; performing a statistical empirical distribution analysis on a first portion of the time-based empirical data associated with a first one of the plurality of time windows to obtain a first analysis result; performing a statistical empirical distribution analysis on a second portion of the time-based empirical data associated with a next one of the plurality of time windows to obtain a second analysis result; determining that the first portion of time-based empirical data is within a threshold variance of the second portion of the time-based empirical data; responsive to determining that the first portion of time-based empirical data is within a threshold variance of the second portion of the time-based empirical data, associating the first analysis result with the second time window and deleting from the memory device the second portion of the empirical data; and generating a representation of the empirical data associated with the first and second time windows from the first analysis result.
 2. The method of claim 1, the method further comprising additionally deleting the first portion of the empirical data associated with the first time window.
 3. The method of claim 1, the method further comprising: determining that the first portion of the time-based empirical data is not within the threshold variance of the second portion of the time-based empirical data; responsive to determining that the first portion of the time-based empirical data is not within the threshold variance of the second portion of the time-based empirical data, defining the second time window as a new first time window and defining the second analysis result as a new first analysis result; performing a statistical empirical distribution analysis on the empirical data associated with the new first time window to obtain a new first analysis result; and performing a statistical empirical distribution analysis on the empirical data associated with a next one of the plurality of time windows to obtain a second analysis result.
 4. The method of claim 1,wherein the method occurs in real-time.
 5. The method of claim 1,wherein the method occurs on demand.
 6. The method of claim 1, whereingenerating the representation of the empirical data associated with thefirst and second time windows from the first analysis result includesgenerating a graphical representation of the empirical data associatedwith the first and second time windows.
 7. The method of claim 1, themethod further comprising, based on a predetermined desired insight,analyzing the representation of the empirical data to obtain datarelevant to the predetermined desired insight.
 8. A method of lossystatistical data compression for network telemetry data, comprising:receiving time-based empirical network telemetry data for a specificperiod of time; storing in a memory device the time-based empiricalnetwork telemetry data; dividing the specific period of time into aplurality of time windows; performing a statistical empiricaldistribution analysis on a first portion of the empirical networktelemetry data associated with a first one of the plurality of timewindows to obtain a first analysis result; performing a statisticalempirical distribution analysis on a second portion of the empiricalnetwork telemetry data associated with a next one of the plurality oftime windows to obtain a second analysis result; determining that thefirst portion of the empirical network telemetry data is within athreshold variance of the second portion of the empirical networktelemetry data; responsive to determining that the first portion of theempirical network telemetry data is within a threshold variance of thesecond portion of the empirical network telemetry data, associating thefirst analysis result with the second time window and deleting from thememory device the empirical network telemetry data associated with thesecond time window; generating a representation of the empirical networktelemetry data associated with the first and second time windows fromthe first analysis result; and based on a predetermined desired insight,analyzing the representation of the empirical network telemetry data toobtain data relevant to the predetermined desired insight.
 9. The method of claim 8, the method further comprising additionally deleting the first portion of the empirical network telemetry data associated with the first time window.
 10. The method of claim 8, the method further comprising: determining that the first portion of the empirical network telemetry data is not within a threshold variance of the second portion of the empirical network telemetry data; responsive to determining that the first portion of the empirical network telemetry data is not within a threshold variance of the second portion of the empirical network telemetry data, defining the second time window as a new first time window and defining the second analysis result as a new first analysis result; performing a statistical empirical distribution analysis on the empirical network telemetry data associated with the new first time window to obtain a new first analysis result; and performing a statistical empirical distribution analysis on the empirical network telemetry data associated with a next one of the plurality of time windows to obtain a second analysis result.
 11. The method of claim 8, wherein the method occurs in real-time.
 12. The method of claim 8, wherein the method occurs on demand.
 13. The method of claim 8, wherein generating the representation of the empirical network telemetry data associated with the first and second time windows from the first analysis result includes generating a graphical representation of the empirical network telemetry data associated with the first and second time windows.
 14. The method of claim 8, wherein the predetermined desired insight comprises one or both of network capacity planning and the occurrence of peak and non-peak network traffic time periods.
 15. A method comprising: in real-time or on demand: receiving and storing in a memory storage device time-based data over a specific time period; dividing the specific time period into a plurality of time windows; and determining that data associated with two or more proximate time windows are within a predetermined variance of one another; responsive to the determination: generating a mathematical function representative of the data associated with the two or more proximate time windows; deleting the data associated with the two or more proximate time windows from the memory storage device; and generating a representation of the deleted data from the mathematical function.
 16. The method of claim 15, wherein the time-based data comprises network telemetry data.
 17. The method of claim 15, wherein the generating the representation includes generating a graphical representation.
 18. The method of claim 15, wherein the time-based data comprises “Big Data.”
 19. The method of claim 15, the method further comprising, based on a predetermined desired insight, analyzing the representation of the empirical data to obtain data relevant to the predetermined desired insight.
 20. The method of claim 19, wherein the predetermined desired insight comprises one or both of network capacity planning and the occurrence of peak and non-peak network traffic time periods.
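
By way of non-limiting illustration only, the window-by-window comparison recited in claims 8 through 10 might be sketched as follows. The sketch rests on several assumptions that do not come from the claims: the "statistical empirical distribution analysis" is stood in for by a mean and a population standard deviation, "within a threshold variance" is read as both statistics differing by no more than a single `threshold` value, and the names `summarize`, `within_threshold`, and `compress` are hypothetical. Only summaries are returned, so the raw values of every window are discarded, which corresponds to the variant of claim 9.

```python
from statistics import mean, pstdev


def summarize(values):
    """Assumed stand-in for the 'analysis result': (mean, population std dev)."""
    return (mean(values), pstdev(values))


def within_threshold(first, second, threshold):
    """Assumed similarity test: both summary statistics differ by at most threshold."""
    return (abs(first[0] - second[0]) <= threshold
            and abs(first[1] - second[1]) <= threshold)


def compress(samples, start, window_len, threshold):
    """Group (timestamp, value) samples into fixed-length time windows and
    merge each window into the preceding segment when its summary is close
    enough to that segment's first analysis result."""
    # Divide the specific period of time into windows keyed by window index.
    windows = {}
    for timestamp, value in samples:
        index = int((timestamp - start) // window_len)
        windows.setdefault(index, []).append(value)

    segments = []
    for index in sorted(windows):
        result = summarize(windows[index])
        last = segments[-1] if segments else None
        if (last is not None
                and last["windows"][-1] == index - 1  # only the next adjacent window
                and within_threshold(last["summary"], result, threshold)):
            # Associate the first analysis result with this window; its own
            # result and raw values are not retained.
            last["windows"].append(index)
        else:
            # Not within the threshold: this window becomes the new first window.
            segments.append({"windows": [index], "summary": result})
    return segments
```

Each returned segment holds the indices of the windows it covers and the single summary retained for them, from which a representation of the discarded data could later be generated. Comparing every window against the segment's first analysis result, rather than against the immediately preceding window, keeps a slow drift from accumulating unnoticed across many merged windows; other readings of the claims are equally possible.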